scikit-learn: make_multilabel_classification

https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_multilabel_classification.html

Generate a random multilabel classification problem.

Arguments

n_samples

n_features

n_classes

n_labels

The average number of labels per instance

More precisely, the number of labels per sample is drawn from a Poisson distribution with n_labels as its expected value, but samples are bounded (using rejection sampling) by n_classes,

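"The number of labels per sample is drawn from a Poisson distribution whose expected value is n_labels."

"However, the drawn value is bounded by n_classes (via rejection sampling: a draw above n_classes is discarded and redrawn)."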

and must be nonzero if allow_unlabeled is False.

"If the allow_unlabeled argument is False, the number of labels must be nonzero."
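The sampling rule quoted above can be sketched in a few lines of NumPy. This is only an illustration of the described behavior, not scikit-learn's actual code: the helper name `sample_label_count` and the parameter values are assumptions made up for this note.

```python
import numpy as np


def sample_label_count(rng, n_labels, n_classes, allow_unlabeled):
    """Draw one per-sample label count by rejection sampling (hypothetical helper)."""
    while True:
        count = int(rng.poisson(n_labels))
        # Reject draws that exceed the number of available classes.
        if count > n_classes:
            continue
        # Reject a draw of zero when unlabeled samples are not allowed.
        if count == 0 and not allow_unlabeled:
            continue
        return count


rng = np.random.default_rng(0)
counts = [
    sample_label_count(rng, n_labels=2, n_classes=5, allow_unlabeled=False)
    for _ in range(1000)
]
print(min(counts), max(counts), sum(counts) / len(counts))
```

Because some draws are rejected, the observed mean of the counts can differ somewhat from n_labels.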

allow_unlabeled

default=True

If True, some instances might not belong to any class.

"Some samples may not belong to any class."
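A quick way to see the effect of allow_unlabeled is to count the labels in each row of Y. The sketch below assumes scikit-learn is installed; the values chosen for n_samples, n_labels and random_state are arbitrary picks for illustration.

```python
from sklearn.datasets import make_multilabel_classification

# With allow_unlabeled=False, every sample should carry at least one label.
X, Y = make_multilabel_classification(
    n_samples=200,
    n_classes=5,
    n_labels=1,
    allow_unlabeled=False,
    random_state=0,
)

# Y is a dense indicator matrix, so a row sum is that sample's label count.
labels_per_sample = Y.sum(axis=1)
print(labels_per_sample.min(), labels_per_sample.max())
```

Running the same check with allow_unlabeled=True (the default) should show a minimum of 0 whenever at least one sample ends up without labels.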

code:example.py

>>> from collections import Counter

>>> from sklearn.datasets import make_multilabel_classification

>>> X, Y = make_multilabel_classification(random_state=1)

>>> X[0]  # feature values are whole numbers (stored as floats)

array([6., 0., 3., 5., 4., 1., 1., 0., 0., 0., 1., 0., 6., 0., 3., 0., 4.,
       2., 2., 4.])

>>> Y[0]

array([0, 0, 0, 0, 1])

>>> X[1]

array([3., 0., 4., 4., 2., 4., 1., 1., 3., 0., 5., 2., 5., 3., 3., 3., 1.,
       3., 6., 5.])

>>> Y[1]

array([0, 1, 0, 0, 1])

>>> # Count how many samples carry each class label. TODO: consider using np.bincount

>>> counter = Counter(idx for labels in Y for idx, label in enumerate(labels) if label == 1)

>>> counter.most_common()  # 0 stands for class_0, 1 for class_1, 2 for class_2, and so on

[(1, 61), (0, 54), (3, 42), (4, 26), (2, 5)]
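Regarding the np.bincount TODO: since Y is a 0/1 indicator matrix, the per-class counts can also be obtained without Counter. The sketch below uses a tiny hand-written Y (an assumption for illustration, not output of make_multilabel_classification) and compares the Counter expression with a column sum and with np.bincount.

```python
from collections import Counter

import numpy as np

# Tiny hand-written indicator matrix (3 samples, 4 classes), purely illustrative.
Y = np.array([
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 1],
])

# Counter version, same expression as in the session above.
counter = Counter(idx for labels in Y for idx, label in enumerate(labels) if label == 1)

# Column sums of the indicator matrix give the per-class counts directly.
per_class = Y.sum(axis=0)

# np.bincount over the column indices of the nonzero entries;
# minlength keeps classes that never occur in the result with a count of 0.
_, cols = np.nonzero(Y)
per_class_bincount = np.bincount(cols, minlength=Y.shape[1])

print(counter.most_common())
print(per_class)
print(per_class_bincount)
```

Unlike Counter, both array-based versions keep an explicit 0 for classes that never appear, and np.argsort on the result can stand in for most_common() when a ranking is needed.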